{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TAPAS表格问答模型应用开发"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型介绍\n",
    "\n",
    "TAPAS（Tabular Parsing for Question Answering from Structured Data）是一种用于从结构化数据（如表格）中进行问答的模型。它是由Google Research开发的，基于BERT架构。TAPAS模型可以直接处理表格数据，并回答与表格内容相关的问题。\n",
    "\n",
    "论文链接：https://arxiv.org/abs/2004.02349\n",
    "\n",
    "AI Gallery项目地址：https://pangu.huaweicloud.com/gallery/asset-detail.html?id=69dbd529-93e4-4a06-ba4f-242c5e82b56c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 环境配置\n",
    "1. python=3.9\n",
    "2. mindnlp=0.4.0\n",
    "3. pandas=2.2.3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 导入相关的库\n",
    "Pandas 是一个开源的 Python 数据处理库，提供了高效、便捷的数据结构和数据分析工具。它广泛应用于数据清洗、数据处理、数据分析和数据可视化等领域。其特有的数据结构dataframe可以作为TAPAS模型接收表格参数的形式。mindnlp 库则是基于 MindSpore 框架构建的，专注于自然语言处理任务的工具和模型。提供了u丰富的nlp预训练模型和数据处理工具。\n",
    "版本依赖：mindnlp=0.4.0,pandas=2.2.3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/zhang/miniconda3/envs/text2music/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "Building prefix dict from the default dictionary ...\n",
      "Loading model from cache /tmp/jieba.cache\n",
      "Loading model cost 0.379 seconds.\n",
      "Prefix dict has been built successfully.\n"
     ]
    }
   ],
   "source": [
    "from mindnlp.transformers import TapasTokenizer, TapasForQuestionAnswering\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载预训练模型\n",
    "加载预训练的TAPAS模型和分词器,这里可以根据需求选择用不同大小的数据集微调的预训练模型，如：\n",
    "tapas-large-finetuned-wtq,tapas-base-finetuned-sqa等，这里选用基于wtq数据集微调的标准大小的预训练模型。根据模型大小的不同，可能需要不同的下载时间，请耐心等待。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[MS_ALLOC_CONF]Runtime config:  enable_vmm:True  vmm_align_size:2MB\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/zhang/miniconda3/envs/text2music/lib/python3.10/site-packages/mindnlp/transformers/tokenization_utils_base.py:1526: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted, and will be then set to `False` by default. \n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "model_name = \"google/tapas-base-finetuned-wtq\"\n",
    "model = TapasForQuestionAnswering.from_pretrained(model_name)\n",
    "tokenizer = TapasTokenizer.from_pretrained(model_name,clean_up_tokenization_spaces=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "source": [
    "## 准备表格数据\n",
    "表格用json格式读取，请务必将表格的数据转为字符串类型。并用pandas转为分词器可接收的table，这里选用了一个关于书籍信息的表格。同时定义关于表格的提问，TAPAS模型支持两类型的任务，一种是表格选择，问题的答案是一个单元格的内容，一种是数据聚合，问题的结果是一个数字。在例程中提出的三个问题中，前两个问题属于第一种任务，第三个问题属于第二种任务。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Author</th>\n",
       "      <th>Year</th>\n",
       "      <th>Category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The Great Gatsby</td>\n",
       "      <td>F. Scott Fitzgerald</td>\n",
       "      <td>1925</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1984</td>\n",
       "      <td>George Orwell</td>\n",
       "      <td>1949</td>\n",
       "      <td>Dystopian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>To Kill a Mockingbird</td>\n",
       "      <td>Harper Lee</td>\n",
       "      <td>1960</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Pride and Prejudice</td>\n",
       "      <td>Jane Austen</td>\n",
       "      <td>1813</td>\n",
       "      <td>Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>The Catcher in the Rye</td>\n",
       "      <td>J.D. Salinger</td>\n",
       "      <td>1951</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>War and Peace</td>\n",
       "      <td>Leo Tolstoy</td>\n",
       "      <td>1869</td>\n",
       "      <td>Historical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>The Odyssey</td>\n",
       "      <td>Homer</td>\n",
       "      <td>-800</td>\n",
       "      <td>Epic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Crime and Punishment</td>\n",
       "      <td>Fyodor Dostoevsky</td>\n",
       "      <td>1866</td>\n",
       "      <td>Philosophical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>The Brothers Karamazov</td>\n",
       "      <td>Fyodor Dostoevsky</td>\n",
       "      <td>1880</td>\n",
       "      <td>Philosophical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>Gabriel García Márquez</td>\n",
       "      <td>1967</td>\n",
       "      <td>Magical Realism</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Brave New World</td>\n",
       "      <td>Aldous Huxley</td>\n",
       "      <td>1932</td>\n",
       "      <td>Dystopian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>The Lord of the Rings</td>\n",
       "      <td>J.R.R. Tolkien</td>\n",
       "      <td>1954</td>\n",
       "      <td>Fantasy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Animal Farm</td>\n",
       "      <td>George Orwell</td>\n",
       "      <td>1945</td>\n",
       "      <td>Satire</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Fahrenheit 451</td>\n",
       "      <td>Ray Bradbury</td>\n",
       "      <td>1953</td>\n",
       "      <td>Dystopian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>The Grapes of Wrath</td>\n",
       "      <td>John Steinbeck</td>\n",
       "      <td>1939</td>\n",
       "      <td>Historical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Catch-22</td>\n",
       "      <td>Joseph Heller</td>\n",
       "      <td>1961</td>\n",
       "      <td>Satire</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>The Hobbit</td>\n",
       "      <td>J.R.R. Tolkien</td>\n",
       "      <td>1937</td>\n",
       "      <td>Fantasy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Jane Eyre</td>\n",
       "      <td>Charlotte Brontë</td>\n",
       "      <td>1847</td>\n",
       "      <td>Gothic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Wuthering Heights</td>\n",
       "      <td>Emily Brontë</td>\n",
       "      <td>1847</td>\n",
       "      <td>Gothic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Gone with the Wind</td>\n",
       "      <td>Margaret Mitchell</td>\n",
       "      <td>1936</td>\n",
       "      <td>Historical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>The Scarlet Letter</td>\n",
       "      <td>Nathaniel Hawthorne</td>\n",
       "      <td>1850</td>\n",
       "      <td>Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>The Adventures of Huckleberry Finn</td>\n",
       "      <td>Mark Twain</td>\n",
       "      <td>1884</td>\n",
       "      <td>Adventure</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Dracula</td>\n",
       "      <td>Bram Stoker</td>\n",
       "      <td>1897</td>\n",
       "      <td>Horror</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Frankenstein</td>\n",
       "      <td>Mary Shelley</td>\n",
       "      <td>1818</td>\n",
       "      <td>Gothic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>The Picture of Dorian Gray</td>\n",
       "      <td>Oscar Wilde</td>\n",
       "      <td>1890</td>\n",
       "      <td>Gothic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Anna Karenina</td>\n",
       "      <td>Leo Tolstoy</td>\n",
       "      <td>1877</td>\n",
       "      <td>Historical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Les Misérables</td>\n",
       "      <td>Victor Hugo</td>\n",
       "      <td>1862</td>\n",
       "      <td>Historical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Great Expectations</td>\n",
       "      <td>Charles Dickens</td>\n",
       "      <td>1861</td>\n",
       "      <td>Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>A Tale of Two Cities</td>\n",
       "      <td>Charles Dickens</td>\n",
       "      <td>1859</td>\n",
       "      <td>Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>The Count of Monte Cristo</td>\n",
       "      <td>Alexandre Dumas</td>\n",
       "      <td>1844</td>\n",
       "      <td>Adventure</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Don Quixote</td>\n",
       "      <td>Miguel de Cervantes</td>\n",
       "      <td>1605</td>\n",
       "      <td>Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>Middlemarch</td>\n",
       "      <td>George Eliot</td>\n",
       "      <td>1871</td>\n",
       "      <td>Classic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>The Iliad</td>\n",
       "      <td>Homer</td>\n",
       "      <td>-750</td>\n",
       "      <td>Epic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>The Sound and the Fury</td>\n",
       "      <td>William Faulkner</td>\n",
       "      <td>1929</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>The Sun Also Rises</td>\n",
       "      <td>Ernest Hemingway</td>\n",
       "      <td>1926</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>Slaughterhouse-Five</td>\n",
       "      <td>Kurt Vonnegut</td>\n",
       "      <td>1969</td>\n",
       "      <td>Satire</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>Beloved</td>\n",
       "      <td>Toni Morrison</td>\n",
       "      <td>1987</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>The Color Purple</td>\n",
       "      <td>Alice Walker</td>\n",
       "      <td>1982</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>The Handmaid's Tale</td>\n",
       "      <td>Margaret Atwood</td>\n",
       "      <td>1985</td>\n",
       "      <td>Dystopian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>The Road</td>\n",
       "      <td>Cormac McCarthy</td>\n",
       "      <td>2006</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>The Alchemist</td>\n",
       "      <td>Paulo Coelho</td>\n",
       "      <td>1988</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>Life of Pi</td>\n",
       "      <td>Yann Martel</td>\n",
       "      <td>2001</td>\n",
       "      <td>Adventure</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>The Kite Runner</td>\n",
       "      <td>Khaled Hosseini</td>\n",
       "      <td>2003</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>A Thousand Splendid Suns</td>\n",
       "      <td>Khaled Hosseini</td>\n",
       "      <td>2007</td>\n",
       "      <td>Fiction</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 Title                  Author  Year  \\\n",
       "0                     The Great Gatsby     F. Scott Fitzgerald  1925   \n",
       "1                                 1984           George Orwell  1949   \n",
       "2                To Kill a Mockingbird              Harper Lee  1960   \n",
       "3                  Pride and Prejudice             Jane Austen  1813   \n",
       "4               The Catcher in the Rye           J.D. Salinger  1951   \n",
       "5                        War and Peace             Leo Tolstoy  1869   \n",
       "6                          The Odyssey                   Homer  -800   \n",
       "7                 Crime and Punishment       Fyodor Dostoevsky  1866   \n",
       "8               The Brothers Karamazov       Fyodor Dostoevsky  1880   \n",
       "9        One Hundred Years of Solitude  Gabriel García Márquez  1967   \n",
       "10                     Brave New World           Aldous Huxley  1932   \n",
       "11               The Lord of the Rings          J.R.R. Tolkien  1954   \n",
       "12                         Animal Farm           George Orwell  1945   \n",
       "13                      Fahrenheit 451            Ray Bradbury  1953   \n",
       "14                 The Grapes of Wrath          John Steinbeck  1939   \n",
       "15                            Catch-22           Joseph Heller  1961   \n",
       "16                          The Hobbit          J.R.R. Tolkien  1937   \n",
       "17                           Jane Eyre        Charlotte Brontë  1847   \n",
       "18                   Wuthering Heights            Emily Brontë  1847   \n",
       "19                  Gone with the Wind       Margaret Mitchell  1936   \n",
       "20                  The Scarlet Letter     Nathaniel Hawthorne  1850   \n",
       "21  The Adventures of Huckleberry Finn              Mark Twain  1884   \n",
       "22                             Dracula             Bram Stoker  1897   \n",
       "23                        Frankenstein            Mary Shelley  1818   \n",
       "24          The Picture of Dorian Gray             Oscar Wilde  1890   \n",
       "25                       Anna Karenina             Leo Tolstoy  1877   \n",
       "26                      Les Misérables             Victor Hugo  1862   \n",
       "27                  Great Expectations         Charles Dickens  1861   \n",
       "28                A Tale of Two Cities         Charles Dickens  1859   \n",
       "29           The Count of Monte Cristo         Alexandre Dumas  1844   \n",
       "30                         Don Quixote     Miguel de Cervantes  1605   \n",
       "31                         Middlemarch            George Eliot  1871   \n",
       "32                           The Iliad                   Homer  -750   \n",
       "33              The Sound and the Fury        William Faulkner  1929   \n",
       "34                  The Sun Also Rises        Ernest Hemingway  1926   \n",
       "35                 Slaughterhouse-Five           Kurt Vonnegut  1969   \n",
       "36                             Beloved           Toni Morrison  1987   \n",
       "37                    The Color Purple            Alice Walker  1982   \n",
       "38                 The Handmaid's Tale         Margaret Atwood  1985   \n",
       "39                            The Road         Cormac McCarthy  2006   \n",
       "40                       The Alchemist            Paulo Coelho  1988   \n",
       "41                          Life of Pi             Yann Martel  2001   \n",
       "42                     The Kite Runner         Khaled Hosseini  2003   \n",
       "43            A Thousand Splendid Suns         Khaled Hosseini  2007   \n",
       "\n",
       "           Category  \n",
       "0           Fiction  \n",
       "1         Dystopian  \n",
       "2           Fiction  \n",
       "3           Classic  \n",
       "4           Fiction  \n",
       "5        Historical  \n",
       "6              Epic  \n",
       "7     Philosophical  \n",
       "8     Philosophical  \n",
       "9   Magical Realism  \n",
       "10        Dystopian  \n",
       "11          Fantasy  \n",
       "12           Satire  \n",
       "13        Dystopian  \n",
       "14       Historical  \n",
       "15           Satire  \n",
       "16          Fantasy  \n",
       "17           Gothic  \n",
       "18           Gothic  \n",
       "19       Historical  \n",
       "20          Classic  \n",
       "21        Adventure  \n",
       "22           Horror  \n",
       "23           Gothic  \n",
       "24           Gothic  \n",
       "25       Historical  \n",
       "26       Historical  \n",
       "27          Classic  \n",
       "28          Classic  \n",
       "29        Adventure  \n",
       "30          Classic  \n",
       "31          Classic  \n",
       "32             Epic  \n",
       "33          Fiction  \n",
       "34          Fiction  \n",
       "35           Satire  \n",
       "36          Fiction  \n",
       "37          Fiction  \n",
       "38        Dystopian  \n",
       "39          Fiction  \n",
       "40          Fiction  \n",
       "41        Adventure  \n",
       "42          Fiction  \n",
       "43          Fiction  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    'Title': [\n",
    "        'The Great Gatsby', '1984', 'To Kill a Mockingbird', 'Pride and Prejudice', 'The Catcher in the Rye',\n",
    "         'War and Peace', 'The Odyssey', 'Crime and Punishment', 'The Brothers Karamazov',\n",
    "        'One Hundred Years of Solitude', 'Brave New World', 'The Lord of the Rings', 'Animal Farm', 'Fahrenheit 451',\n",
    "        'The Grapes of Wrath', 'Catch-22', 'The Hobbit', 'Jane Eyre', 'Wuthering Heights',\n",
    "        'Gone with the Wind', 'The Scarlet Letter', 'The Adventures of Huckleberry Finn', 'Dracula', 'Frankenstein',\n",
    "        'The Picture of Dorian Gray', 'Anna Karenina', 'Les Misérables', 'Great Expectations', 'A Tale of Two Cities',\n",
    "        'The Count of Monte Cristo', 'Don Quixote', 'Middlemarch', 'The Iliad', 'The Sound and the Fury',\n",
    "        'The Sun Also Rises', 'Slaughterhouse-Five', 'Beloved', 'The Color Purple', 'The Handmaid\\'s Tale',\n",
    "        'The Road', 'The Alchemist', 'Life of Pi', 'The Kite Runner', 'A Thousand Splendid Suns'\n",
    "    ],\n",
    "    'Author': [\n",
    "        'F. Scott Fitzgerald', 'George Orwell', 'Harper Lee', 'Jane Austen', 'J.D. Salinger',\n",
    "        'Leo Tolstoy', 'Homer', 'Fyodor Dostoevsky', 'Fyodor Dostoevsky',\n",
    "        'Gabriel García Márquez', 'Aldous Huxley', 'J.R.R. Tolkien', 'George Orwell', 'Ray Bradbury',\n",
    "        'John Steinbeck', 'Joseph Heller', 'J.R.R. Tolkien', 'Charlotte Brontë', 'Emily Brontë',\n",
    "        'Margaret Mitchell', 'Nathaniel Hawthorne', 'Mark Twain', 'Bram Stoker', 'Mary Shelley',\n",
    "        'Oscar Wilde', 'Leo Tolstoy', 'Victor Hugo', 'Charles Dickens', 'Charles Dickens',\n",
    "        'Alexandre Dumas', 'Miguel de Cervantes', 'George Eliot', 'Homer', 'William Faulkner',\n",
    "        'Ernest Hemingway', 'Kurt Vonnegut', 'Toni Morrison', 'Alice Walker', 'Margaret Atwood',\n",
    "        'Cormac McCarthy', 'Paulo Coelho', 'Yann Martel', 'Khaled Hosseini', 'Khaled Hosseini'\n",
    "    ],\n",
    "    'Year': [\n",
    "        '1925', '1949', '1960', '1813', '1951',\n",
    "        '1869', '-800', '1866', '1880',\n",
    "        '1967', '1932', '1954', '1945', '1953',\n",
    "        '1939', '1961', '1937', '1847', '1847',\n",
    "        '1936', '1850', '1884', '1897', '1818',\n",
    "        '1890', '1877', '1862', '1861', '1859',\n",
    "        '1844', '1605', '1871', '-750', '1929',\n",
    "        '1926', '1969', '1987', '1982', '1985',\n",
    "        '2006', '1988', '2001', '2003', '2007'\n",
    "    ],\n",
    "    'Category': [\n",
    "        'Fiction', 'Dystopian', 'Fiction', 'Classic', 'Fiction',\n",
    "        'Historical', 'Epic', 'Philosophical', 'Philosophical',\n",
    "        'Magical Realism', 'Dystopian', 'Fantasy', 'Satire', 'Dystopian',\n",
    "        'Historical', 'Satire', 'Fantasy', 'Gothic', 'Gothic',\n",
    "        'Historical', 'Classic', 'Adventure', 'Horror', 'Gothic',\n",
    "        'Gothic', 'Historical', 'Historical', 'Classic', 'Classic',\n",
    "        'Adventure', 'Classic', 'Classic', 'Epic', 'Fiction',\n",
    "        'Fiction', 'Satire', 'Fiction', 'Fiction', 'Dystopian',\n",
    "        'Fiction', 'Fiction', 'Adventure', 'Fiction', 'Fiction'\n",
    "    ]\n",
    "}\n",
    "table = pd.DataFrame.from_dict(data)\n",
    "\n",
    "questions = [\"Who is the author of The Lord of the Rings?\",\"which book published on 1987?\",\"How many books belonging to Adventure in sum?\"]\n",
    "table\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型推理\n",
    "使用编码器对表格和问题进行编码，并使用模型进行推理。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/zhang/miniconda3/envs/text2music/lib/python3.10/site-packages/mindnlp/transformers/models/tapas/tokenization_tapas.py:2663: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`\n",
      "  text = normalize_for_match(row[col_index].text)\n",
      "/home/zhang/miniconda3/envs/text2music/lib/python3.10/site-packages/mindnlp/transformers/models/tapas/tokenization_tapas.py:1462: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`\n",
      "  cell = row[col_index]\n"
     ]
    }
   ],
   "source": [
    "inputs = tokenizer(table=table, queries=question, return_tensors=\"ms\",padding=\"max_length\")\n",
    "\n",
    "input_ids=inputs[\"input_ids\"]\n",
    "attention_mask=inputs[\"attention_mask\"]\n",
    "token_type_ids=inputs[\"token_type_ids\"]\n",
    "\n",
    "outputs = model(input_ids=input_ids,attention_mask=attention_mask,token_type_ids=token_type_ids)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 结果解析\n",
    "对预测结果进行解析，并对问题做出回答。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question: Who is the author of The Lord of the Rings?\n",
      "Answer: J.R.R. Tolkien\n",
      "Question: which book published on 1987?\n",
      "Answer: Beloved\n",
      "Question: How many books belonging to Adventure in sum?\n",
      "Answer: 3 ['The Adventures of Huckleberry Finn', 'The Count of Monte Cristo', 'Life of Pi']\n"
     ]
    }
   ],
   "source": [
    "# 获取预测结果\n",
    "predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(\n",
    "    inputs,\n",
    "    outputs.logits,\n",
    "    outputs.logits_aggregation\n",
    ")\n",
    "\n",
    "# 解析预测结果\n",
    "answers = []\n",
    "numbers=[]\n",
    "for coordinates in predicted_answer_coordinates:\n",
    "    if len(coordinates) == 1:\n",
    "        # 单个单元格答案\n",
    "        answers.append(table.iat[coordinates[0]])\n",
    "    else:\n",
    "        # 多个单元格答案\n",
    "        cell_values = []\n",
    "        for coordinate in coordinates:\n",
    "            cell_values.append(table.iat[coordinate])\n",
    "        answers.append(cell_values)\n",
    "\n",
    "for num in predicted_aggregation_indices:\n",
    "    numbers.append(num)\n",
    "\n",
    "# 打印答案\n",
    "for i in range(len(questions)):\n",
    "    print(\"Question:\", questions[i])\n",
    "    #n判断是否为聚合型问题\n",
    "    if numbers[i]!=0:\n",
    "        print(\"Answer:\",numbers[i],answers[i])\n",
    "    else:\n",
    "        print(\"Answer:\", answers[i])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "text2music",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
